Power-free values, repulsion between points, differing beliefs 

and the existence of error 

Harald Andres Helfgott 

Abstract. Let f be a cubic polynomial. Then there are infinitely many primes p such 
that f(p) is square-free. 

An integer n is said to be square-free if it is not divisible by any squares other than 1. 
More generally, n is free of kth powers if $d \in \mathbb{Z}$, $d^k \mid n \Rightarrow d = \pm 1$; square-freeness is what 
we get in the case k = 2. Being square-free - or at least free of kth powers for some k - 
is a desirable property: many things are easier to prove for square-free numbers. Thus, 
given a set of integers, it is good to know whether infinitely many of them - or a positive 
proportion of them - are square-free. 

Let f be a polynomial with integer coefficients. Is there an infinite number of integers 
n such that f(n) is free of kth powers? There are some polynomials for which the answer 
is clearly "no": say k = 2 and f(n) = 4n or f(n) = $n^2$. Slightly more subtly, consider 
f(n) = n(n + 1)(n + 2)(n + 3) + 4, which is always divisible by 4. Assume, then, that f 
has no factors repeated k times and that the following local condition holds: 

(*) for every prime p, $f(x) \not\equiv 0 \bmod p^k$ has at least one solution in $\mathbb{Z}/p^k\mathbb{Z}$. 

(Both conditions are obviously necessary, and both can be checked easily in bounded time.) 
Then, it is believed, there must be an infinite number of integers n such that f(n) is free 
of kth powers. 
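
As a minimal sketch of how the local condition (*) can be checked by brute force, one can run through every residue class modulo $p^k$ for a few small primes p. The helper name `local_condition_holds`, the list `small_primes` and the polynomial `f_good` are illustrative choices made here, not taken from the text; `f_bad` is the quartic example given above.

```python
def local_condition_holds(f, p, k):
    """Return True if f(x) is nonzero mod p**k for at least one residue class x."""
    modulus = p ** k
    return any(f(x) % modulus != 0 for x in range(modulus))


def f_bad(n):
    # Example from the text: this value is always divisible by 4.
    return n * (n + 1) * (n + 2) * (n + 3) + 4


def f_good(n):
    # Hypothetical example chosen for illustration only.
    return n ** 3 + 2


small_primes = [2, 3, 5, 7]
bad_report = {p: local_condition_holds(f_bad, p, 2) for p in small_primes}
good_report = {p: local_condition_holds(f_good, p, 2) for p in small_primes}

print("f_bad :", bad_report)
print("f_good:", good_report)
```

For `f_bad` the check fails exactly at p = 2, which is the obstruction described above; for `f_good` it succeeds at every prime tested.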

If $\deg(f) \le k$, it is not hard to prove as much. If deg(f) > k + 1, the problem is too 
hard, at least when k is small. For deg(f) = k + 1, the statement was proven by Erdős in 
1953. In particular, if f is a cubic polynomial without repeated factors and f satisfies the 
local condition (*), then there are infinitely many integers n such that f(n) is square-free. 

Like many proofs in analytic number theory, Erdős's proof is rather tricky, in that it 
uses the fact that most integers are not prime in order to avoid certain essential issues. 



If a technique is strong enough to prove that infinitely many numbers in the bag are square-free, it 
is generally also strong enough to show that a positive proportion are, and even to show which proportion 
are divisible by certain specific squares and no others. This is certainly the case for all the techniques 
discussed here. Results of this strength are necessary for applications in which, for example, we need to 
go from the discriminant of an elliptic curve to its conductor, which is essentially the product of the prime 
factors of the discriminant. See [12] for some general machinery. 
Perhaps because of this, Erdős asked: for f cubic, are there infinitely many primes p such 
that f(p) is square-free? (More generally, for f of degree k + 1, are there infinitely many 
primes p such that f(p) is free of kth powers?) He conjectured that there are, but, as 
might be expected, most tricks used before break down. 

Hooley ([10], [11]) proved Erdős's conjecture for $k \ge 51$. At about the same time, Nair 
[13] proved the conjecture for $k \ge 7$, using rather different methods. (He was also the first 
to treat polynomials with $\deg(f) \ge k + 2$, k large; Heath-Brown ([4]) has attained further 
progress on this front.) In both approaches, k small is harder than k large; in particular, 
the case $k \le 6$ remained open. 

In [5], I proved Erdős's conjecture for all polynomials f with high entropy; in particular, 
the proof works when k = 2, deg(f) = 3 and Gal(f) = $A_3$. However, most cubic 
polynomials have Galois group $S_3$, and their case remained open until now. 

I have now managed to prove Erdos's conjecture for general cubics. 

Main Theorem. Let $f \in \mathbb{Z}[x]$ be a cubic polynomial without repeated roots. Assume 
that, for every prime q, there is a solution $x \in (\mathbb{Z}/q^2\mathbb{Z})^*$ to $f(x) \not\equiv 0 \bmod q^2$. Then there 
are infinitely many primes p such that f(p) is square-free. 

In fact, f(p) is square-free for a positive proportion $C_f$ of all primes p - where $C_f$ is 
exactly what we would expect (an infinite product of local densities). 
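
To get a feeling for the statement, here is a small numerical sketch that lists the primes p below a bound for which f(p) is square-free. The cubic $f(x) = x^3 + 2$ is a hypothetical example chosen here (it has no repeated roots, and x = 1 gives f(1) = 3, which is nonzero modulo every $q^2$); the function names are likewise illustrative. The experiment illustrates the theorem but of course proves nothing.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_squarefree(n):
    """Return True if no square d*d with d > 1 divides n (n nonzero)."""
    n = abs(n)
    if n == 0:
        return False
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def f(x):
    # Hypothetical cubic chosen for illustration only.
    return x ** 3 + 2


def squarefree_prime_arguments(poly, bound):
    """Primes p < bound such that poly(p) is square-free."""
    return [p for p in range(2, bound) if is_prime(p) and is_squarefree(poly(p))]


primes = [p for p in range(2, 200) if is_prime(p)]
good = squarefree_prime_arguments(f, 200)
print(len(good), "of", len(primes), "primes p < 200 give a square-free f(p)")
```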

The tools used, developed and sharpened in the proof are mostly from diophantine 
geometry and probabilistic number theory; there is a key use of the modularity of elliptic 
curves. Let us take a quick walk through the proof. (A full account shall appear elsewhere.) 

It is not hard to show that 

(1)   $|\{p < N : f(p) \text{ is square-free}\}| = C_f \frac{N}{\log N} (1 + o(1))$ 

          $+\ O(|\{x, y < N,\ d < N(\log N)^{\epsilon} : x, y \text{ prime},\ dy^2 = f(x)\}|)$, 

where |S| is the number of elements of a set S and $C_f$ is the (non-zero) constant we would 
expect. This (well-known) initial step goes roughly as follows. Small square factors can be 
sieved out because we know how many primes there are in arithmetic progressions to small 
moduli; medium-sized square factors amount to a small error term, since $\sum_{d > m} N/d^2$ is 
quite small with respect to N as soon as m is moderately large. Large square factors 
cannot be brushed aside by the same argument as medium-sized square factors simply 
because there are so many of them: an additional term that is overshadowed by $N/d^2$ in 
the medium range comes to the fore here. This is why the contribution of the large square 
factors figures in (1) as the error term within O(·). The sole problem from now on, then, 
is to show that the expression within O(·) is $o(N/\log N)$. 

As you can tell from the notation, we intend to see this as a problem of bounding the 
number of integer points (x,y) on curves $dy^2 = f(x)$, f a fixed cubic polynomial. The 
issues are two. First, we need very good bounds - almost as good as O(1) - for the number 
of points for each d, or at least for every typical d. Second, even a bound of O(1) would 
not be enough! There are $N(\log N)^{\epsilon}$ curves to consider, and a bound of O(1) per curve 
would amount to a total bound of $O(N(\log N)^{\epsilon})$, whereas we need $o(N/\log N)$. 

Let us begin with the first issue: we want good bounds on the number of integer points 
(x,y), x,y < N, on the curve $C_d$ described by $dy^2 = f(x)$, d fixed. Most techniques for 
bounding the number of points on curves are based on some kind of repulsion: if there 
are bees in a room, and each bee stays a yard or more away from every other bee, there 
cannot be too many bees in the room. Repulsion may happen in the visible geometry of 
the curve, viz., its graph, as in [1]; such a perspective, unfortunately, would not be nearly 
enough in our case. Alternatively, we may look at repulsion in the Mordell-Weil lattice 
corresponding to the curve. 

Rational and integer points on curves. Let C be a curve of genus g > 0 over $\mathbb{Q}$ (or a 
number field K). The curve C can be embedded in its Jacobian $J_C$. The rational points 
$J_C(K)$ in the Jacobian form a finitely generated abelian group under the group law of the 
Jacobian; they are, furthermore, endowed with a natural norm given by the square root 
of the canonical height. Hence $J_C(K)$ can be naturally embedded in $\mathbb{R}^r$, where r is the 
rank of $J_C(K)$. We thus have an (almost) injective map 

$\iota : C(K) \to L \subset \mathbb{R}^r$, 

where C(K) is the set of rational points on C and L is a lattice in $\mathbb{R}^r$. What can be said 
about the image $\iota(C(K))$? 

If the genus g is 1, $\iota(C(K))$ is all of L. However, if g > 1, then $\iota(C(K))$ looks quite 
sparse within L. Mumford [Hj proved that the points of $\iota(C(K))$ repel each other: for 
any $P_1, P_2 \in \iota(C(K))$ at about the same distance from the origin O, the angle $\angle P_1 O P_2$ 
separating $P_1$ from $P_2$ is at least 60° (for g = 2), 70.5° (for g = 3), 75.5° (for g = 4), . . . - 
in general, $\angle P_1 O P_2 \ge \arccos\frac{1}{g}$. 

Assume now that the points $P_1$ and $P_2$ come from integer points on C. Then, as I 
pointed out in my thesis ([HI Ch. 4] or Lem. 4.16]; see also the earlier work of Silverman 
|15| . [31 Prop. 5], which seems to have originated from the same observation) the angle 
$\angle P_1 O P_2$ is larger than if $P_1$ and $P_2$ were merely rational: the angle is at least 60° for 
g = 1, 75.5° for g = 2, . . . - in general, at least $\arccos\frac{1}{2g}$. We obtain better bounds as a 
consequence. (If g = 1, we obtain bounds where Mumford's work does not by itself give 
any.) The bounds are obtained by means of results on sphere-packing; indeed, the number 
of points fitting at a certain distance from the origin and at a separation of > 60° from 
each other is precisely the number of solid spheres that can fit around a sphere of the same 
size in the given dimension. 
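
The angles quoted above can be recomputed directly from the two general formulas, $\arccos\frac{1}{g}$ for rational points and $\arccos\frac{1}{2g}$ for integer points. The following sketch does just that; the function names are illustrative.

```python
import math


def rational_point_angle_deg(g):
    """Lower bound arccos(1/g), in degrees, quoted for rational points (g >= 2)."""
    return math.degrees(math.acos(1.0 / g))


def integer_point_angle_deg(g):
    """Lower bound arccos(1/(2g)), in degrees, quoted for integer points (g >= 1)."""
    return math.degrees(math.acos(1.0 / (2 * g)))


for g in (2, 3, 4):
    print("rational, g =", g, "->", round(rational_point_angle_deg(g), 1), "degrees")

for g in (1, 2):
    print("integer,  g =", g, "->", round(integer_point_angle_deg(g), 1), "degrees")
```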

In the case of the curve $E_d : dy^2 = f(x)$, the bounds we obtain are of the form $c_1^{\omega(d)}$, 
$c_1 > 1$ fairly small (< 2), where $\omega(d)$ is the number of prime divisors of d. This is still not 
good enough, as, on the average, it amounts to 
about $(\log N)^{c_2}$, $c_2$ a small but fixed non-zero constant, for d ~ N; what we would like is 
a bound of the form $(\log N)^{\epsilon}$. 

Visible vs. Mordell-Weil geometry. In [9], we remark upon the following phenomenon. 
Let $P_1$, $P_2$ be two integer (or rational) points on C(K) at about the same distance from the 
origin. Suppose that their coordinates $(x_1, y_1)$, $(x_2, y_2)$ are close to each other, either in 
the real place (that is, $|x_1 - x_2|$ and $|y_1 - y_2|$ are small) or p-adically (that is, $x_1 \equiv x_2 \bmod d$ 
and $y_1 \equiv y_2 \bmod d$ for a large integer d). Then the angle $\angle P_1 O P_2$ in the Mordell-Weil 
lattice is even larger than it would already have to be. In other words: if two points are 
close to each other in the graph of the curve in $\mathbb{R}^2$, they must be especially far from each 
other in the Mordell-Weil lattice. Thus, if we partition the set of all rational points into 
sets of points close to each other in the graph of the curve, we shall obtain an especially 
good bound on the number of elements of each such set. The main concern is then to keep 
the number of such sets small. 

In the case of the curve $E_d : dy^2 = f(x)$, we have that any two points $(x_1, y_1)$, $(x_2, y_2)$ 
on $E_d$ induce points $P_1 = (x_1, d^{1/2} y_1)$, $P_2 = (x_2, d^{1/2} y_2)$ on $E : y^2 = f(x)$. The 
y-coordinates of $P_1$ and $P_2$ are already close to each other modulo d (that is, modulo the 
prime ideals in $\mathbb{Q}(d^{1/2})$ dividing d). The congruence classes mod d into which $x_1$ and $x_2$ 
may fall are rather restricted, as $f(x_1) \equiv 0 \bmod d$ and $f(x_2) \equiv 0 \bmod d$; the total number 
of congruence classes x modulo d for which $f(x) \equiv 0 \bmod d$ is at most $3^{\omega(d)}$. If $P_1$ and $P_2$ 
have x-coordinates in the same congruence class modulo d, then the angle $\angle P_1 O P_2$ turns 
out to be at least 90°, or 90° - $\epsilon$. Very few points can fit in $\mathbb{R}^r$ lying at about the same 
distance from the origin and subtending angles of 90° - $\epsilon$ or more from each other. 
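
The count of congruence classes can be checked numerically in a simple case. The sketch below takes a monic cubic and square-free moduli d, where the bound $3^{\omega(d)}$ follows from the Chinese remainder theorem together with the fact that a monic cubic has at most three roots modulo a prime. Both the restriction to square-free d and the particular cubic $x^3 - x + 1$ are simplifying assumptions made here for illustration.

```python
def omega(n):
    """Number of distinct prime divisors of n."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    return count + (1 if n > 1 else 0)


def is_squarefree(n):
    """Return True if no square d*d with d > 1 divides the positive integer n."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def f(x):
    # Illustrative monic cubic, not taken from the text.
    return x ** 3 - x + 1


def root_count(d):
    """Number of residue classes x mod d with f(x) = 0 mod d."""
    return sum(1 for x in range(d) if f(x) % d == 0)


# Triples (d, number of roots mod d, 3**omega(d)) for square-free d.
checked = [(d, root_count(d), 3 ** omega(d))
           for d in range(2, 300) if is_squarefree(d)]

worst = max(checked, key=lambda t: t[1])
print("largest root count found:", worst)
```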

There is the problem that $3^{\omega(d)}$ is too large - a power of (log N), on the average, since 
$\omega(d)$ is usually about log log N. However, with probability 1, an integer d < N has a large 
divisor $d_0$ ($> N^{1-\epsilon}$) with few prime divisors ($< \epsilon \log\log N$). We may thus consider points 
$P_1$, $P_2$ congruent to each other modulo $d_0$, rather than d, and obtain angles $\angle P_1 O P_2$ of 
size at least 90° - $\epsilon$ while considering at most $3^{\omega(d_0)}$ possible congruence classes. The 
total bound on the number of integer points (x,y) on $E_d : dy^2 = f(x)$ with x,y < N 
is $O((\log N)^{\epsilon})$ for every typical d, that is, for each d < N outside a set of size at most 
$N/(\log N)^{1000}$. 

This is almost as good as O(1). The problem, as said before, is that this is not good 
enough; since there are N integers d = 1, 2, . . . , N to consider, the total bound would be 
O(N). The issue, then, is how to eliminate most d. Probabilistic number theory has just 
made its first appearance; it shall play a crucial role in what follows. 

The perspective of the value and the perspective of the argument. Our task is still to 
show that 

(2)   $|\{x, y < N,\ d < N(\log N)^{\epsilon} : x, y \text{ prime},\ dy^2 = f(x)\}|$ 

is at most $o(N/\log N)$. We have a good bound for each d, namely, $O((\log N)^{\epsilon})$ for each 
d outside a small set, and a reasonable bound for each d inside that small set. The idea 
will now be to consider, in a solution to $dy^2 = f(x)$, what kind of integer $d = f(x)/y^2$ 
typically is, and whether it looks much like a typical integer d. We will show that, for most 
x, the integer $d = f(x)/y^2$ must look rather strange, and that thus there can be few such 
d. Stated otherwise: we shall prove that every prime x < N must either lie within a fixed 
set of size $o(N/\log N)$ or be such that, if $dy^2 = f(x)$ for some prime y and some integer 
$d < N(\log N)^{\epsilon}$, then d must lie within a fixed set of size $O(N/(\log N)^{1+10\epsilon})$. Combined 
with our bound for each d, this will yield immediately that (2) is indeed $o(N/\log N)$. 

What are, then, the ways in which f(x) will tend to be strange for a random prime 
x? And which of those ways of strangeness will carry over to d, if f(x) can be written in 
the form $dq^2$, where q is a large prime? 

As far as the second question is concerned: since q is prime, f(x) and d have almost the 
same number of prime divisors. Thus, if we can show that the number of prime divisors 
of f(x) is strange for x random, we will have shown that the number of prime divisors of 
d is strange for x random. 
Now, for x random, the number of prime divisors $\omega(f(x))$ will be about log log N. 
Thus, $\omega(d)$ will also be about log log N. Unfortunately, this is typical, not strange, for an 
integer of the size of d. 

Consider, however, primes of different kinds. Let $K = \mathbb{Q}(\alpha)$, where $\alpha$ is a root of 
f(x) = 0. Then some primes p will not split at all in K/Q, some primes will split 
completely, and some primes may split yet not split completely. We can write $\omega_1(n)$, $\omega_2(n)$ and 
$\omega_3(n)$ for the number of prime divisors of n of each kind, in this order. Then, as we shall see, $\omega_j(f(x))$ 
(and thus $\omega_j(d)$) will tend to be strange for x random. 

Suppose first that K/Q has Galois group $A_3$. Then every prime p must either split 
completely or not split at all. If p does not split at all, then $f(x) \equiv 0 \bmod p$ has no solutions. 
Hence $\omega_1(f(x)) = 0$, and so $\omega_1(d) = 0$. This is certainly atypical for an integer d < N. 
(Usually $\omega_1(d) \sim \frac{2}{3}\log\log N$.) Now suppose p splits completely. Then $f(x) \equiv 0 \bmod p$ has 
three solutions mod p. Thus, for a random prime x, we shall have $f(x) \equiv 0 \bmod p$ with 
probability 3/p. Hence $\omega_2(f(x))$ will most likely be about $\sum_{p \text{ splits completely}} \frac{3}{p} \sim \log\log N$. 
Thus $\omega_2(d) \sim \log\log N$, whereas an integer d < N usually has $\omega_2(d) \sim \frac{1}{3}\log\log N$. 
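
The dichotomy "no roots or three roots" can be observed on a concrete cubic with Galois group $A_3$. The sketch below uses $x^3 - 3x + 1$, whose discriminant is $81 = 9^2$; this polynomial is an example chosen here, not one from the text, and the prime 3 is left out because it ramifies. For this particular field it is a standard fact that the primes splitting completely are those congruent to ±1 modulo 9, which the output can be compared against.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def f(x):
    # Illustrative cubic with square discriminant 81, hence Galois group A3.
    return x ** 3 - 3 * x + 1


def root_count(p):
    """Number of solutions of f(x) = 0 mod p."""
    return sum(1 for x in range(p) if f(x) % p == 0)


# Root counts for the unramified primes below 100.
root_table = {p: root_count(p) for p in range(2, 100) if is_prime(p) and p != 3}
split_primes = [p for p, count in root_table.items() if count == 3]
inert_primes = [p for p, count in root_table.items() if count == 0]

print("split completely:", split_primes)
print("do not split:    ", inert_primes)
```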

It is not enough, however, to show that d is strange (i.e., in a set of size o(N)); we 
must show that d is strange enough (i.e., in a set of size $O(N/(\log N)^{1+\epsilon})$). How odd is it 
for a random integer d < N to have $\omega_1(d) = 0$ and $\omega_2(d) \sim \log\log N$? The number $\omega_1(d)$ 
equals $\sum_{p \text{ does not split}} X_p$, where $X_p$ is a random variable taking the value 1 when p | d and 
0 otherwise. Similarly, $\omega_2(d) = \sum_{p \text{ splits completely}} X_p$. Now $X_p$ is 1 with probability 1/p 
and 0 with probability 1 - 1/p. Suppose the variables $X_p$ to be mutually independent. 
Then the theory of large deviations (Cramér's theorem, or, more appropriately, Sanov's 
theorem) offers the upper bound 

$$\mathrm{Prob}\Big(\sum_{p \text{ does not split}} X_p = 0 \ \wedge \sum_{p \text{ splits completely}} X_p > (1-\epsilon)\log\log N\Big) \ll \frac{1}{(\log N)^{\log 3 - \epsilon'}}.$$



Now, of course, the variables $X_p$ are not actually mutually independent; the variables 
$X_{p_1}, X_{p_2}, \ldots, X_{p_k}$ can be assumed to be (approximately) mutually independent only when 
$p_1 p_2 \cdots p_k < N$. However, the fact that $X_p$ has only a small probability of being non-zero 
allows us to use the main technique from Erdős and Kac's Gaussian paper [2] to show that 
we may treat the variables $X_p$, for our purposes, as if they were mutually independent. 
Thus we do obtain 

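$$\mathrm{Prob}\big(\omega_1(d) = 0 \wedge \omega_2(d) > (1-\epsilon)\log\log N\big) \ll \frac{1}{(\log N)^{\log 3 - \epsilon'}} = O\Big(\frac{1}{(\log N)^{1+\epsilon''}}\Big),$$

so that such integers d < N lie in a set of size $O(N/(\log N)^{1+\epsilon''})$, as desired. 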

We are done proving the main theorem when Gal f = $A_3$. What happens when Gal f = 
$S_3$? While our analysis is in the main still valid, the exponent that we obtain instead of 
log 3 is $\frac{1}{2}\log 3$, which is less than 1, and thus insufficient. (In general, for $dy^k = f(x)$, 
deg(f) = k + 1, the exponent we get may be expressed as an entropy, which will depend 
on Gal(f) alone. Sometimes entropy(Gal(f)) > 1, and we are done, and sometimes, as in 
the case of Gal(f) = $S_3$, the entropy is < 1.) 
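
The two exponents, log 3 and $\frac{1}{2}\log 3$, can be recovered from a back-of-the-envelope computation. The sketch below is a heuristic assumed here for illustration and is not the argument of the paper: it treats the number of prime divisors of each splitting type as a Poisson variable. A class of primes of density $\delta$ modulo which f has r roots then contributes $\delta$ to the exponent if r = 0 and $\delta(r \log r - r + 1)$ otherwise. The densities used are the proportions of elements of $A_3$ and $S_3$ with a given number of fixed points.

```python
import math


def deviation_exponent(classes):
    """Heuristic exponent E such that the strange d have density about (log N)**(-E).

    classes: list of (density, roots) pairs, one per type of prime, where
    `roots` is the number of solutions of f(x) = 0 mod p for primes of that type.
    This Poisson-model formula is an illustrative assumption.
    """
    total = 0.0
    for density, roots in classes:
        if roots == 0:
            # Primes of this type must not divide d at all.
            total += density
        else:
            # Cost of having `roots` times as many prime divisors as usual.
            total += density * (roots * math.log(roots) - roots + 1)
    return total


# (density of primes, number of roots of f mod p) for each splitting type.
A3_CLASSES = [(1 / 3, 3), (2 / 3, 0)]
S3_CLASSES = [(1 / 6, 3), (1 / 2, 1), (1 / 3, 0)]

print("A3 exponent:", deviation_exponent(A3_CLASSES), "log 3 =", math.log(3))
print("S3 exponent:", deviation_exponent(S3_CLASSES), "log(3)/2 =", math.log(3) / 2)
```

Under this model the primes with exactly one root contribute nothing, which matches the remark that such primes are useless.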

The reason is that, when Gal f = $S_3$, half of the primes split in K/Q. These primes 
divide f(x) with exactly the same probability that they would divide a random integer, 
and thus they are useless. What is to be done, then? How can one bridge a gap of size 
$1/(\log N)^{1 - \frac{1}{2}\log 3}$? 

The existence of error. Modularity. Again: what is a way of strangeness such that, if 
f(x) is strange and $d = f(x)/q^2$, q a prime, then d must be strange as well? Having too 
few or too many prime factors of some kind is one way. Is there another one? 

We have used the fact that $q^2$ has only one prime factor; let us now use the fact that 
$q^2$ is a square. For any prime modulus p, the integer d will be a square mod p if and only 
if f(x) is a square mod p. Now, a random integer is as likely to be a square mod p as a 
non-square mod p. How likely is f(x) to be a square mod p for a random integer x (or a 
random prime x)? 

By the Weil bounds, there are $p + O(\sqrt{p})$ points on the curve $y^2 = f(x)$ mod p. Hence 
the probability that f(x) will be a square mod p is $\frac{1}{2} + O(p^{-1/2})$. This is not good, as $\frac{1}{2}$ 
would be the probability if there were nothing amiss to be exploited. Let us show that an 
error of size about $p^{-1/2}$ is in fact present a positive proportion of the time. 

Write the number of points on the curve $y^2 = f(x)$ mod p as $p + 1 - a_p$. Then the 
probability that f(x) will be a square mod p is precisely $\frac{1}{2} - \frac{a_p}{2p} + O(1/p)$; we have to give 
a lower bound, on the average, for the size of $|a_p|/2p$ (or, rather, $a_p^2/p^2$, since we shall 
later use a variance bound). Now, the L-function of $E : y^2 = f(x)$ is $\sum_n a_n n^{-s}$. By the 
modularity of elliptic curves (proven by Wiles et al.), there is a modular form g associated 
to L. We may, in turn, define a Rankin-Selberg L-function $L_{g \otimes g}$ associated to g, and use 
the standard facts that $L_{g \otimes g} = \sum_n a_n^2 n^{-s}$ and that $L_{g \otimes g}$ has a simple pole at s = 2. By 
some Tauberian work (or proceeding as in the proof of the prime number theorem) we 
deduce that $\sum_{p < z} |a_p|^2/p^2$ is asymptotic to log log z; in other words, $a_p^2$ is of size about p 
on the average. 
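
The quantities $a_p$ are easy to compute for small p, which makes the bias in the probability of being a square visible. The sketch below counts points on $y^2 = f(x)$ mod p by means of Legendre symbols, for the illustrative cubic $x^3 - x + 1$ (an assumption made here; its discriminant is -23, so the primes 2 and 23 are skipped). It checks the bound $a_p^2 \le 4p$ and the exact count of x for which f(x) is a non-zero square. The running sum of $a_p^2/p^2$ is printed only for orientation, since log log z grows far too slowly for a small experiment to confirm the asymptotic.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def f(x):
    # Illustrative cubic with discriminant -23, not taken from the text.
    return x ** 3 - x + 1


def legendre(v, p):
    """Legendre symbol (v/p) for an odd prime p, via Euler's criterion."""
    v %= p
    if v == 0:
        return 0
    return 1 if pow(v, (p - 1) // 2, p) == 1 else -1


def point_statistics(p):
    """Return (a_p, number of x with f(x) a non-zero square, number of roots)."""
    symbols = [legendre(f(x), p) for x in range(p)]
    # Affine points number sum(1 + symbol) = p - a_p; one more point lies at infinity.
    a_p = -sum(symbols)
    return a_p, symbols.count(1), symbols.count(0)


good_primes = [p for p in range(3, 200) if is_prime(p) and p != 23]
stats = {p: point_statistics(p) for p in good_primes}

running_sum = sum(a * a / (p * p) for p, (a, _, _) in stats.items())
print("sum of a_p^2 / p^2 over good primes p < 200:", round(running_sum, 3))
```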

It is somewhat unpleasant to have to use modularity here, as we need not know the 
behaviour of $L_E$ (or $L_{g \otimes g}$) inside the critical strip. Still, it is hard to see how to do 
without modularity or some strong kindred result. Marc Hindry and Mladen Dimitrov 
have pointed out to me that, if one wants to give a (conditional) statement on kth-power-free 
values of polynomials of degree k + 1, k > 2, it may be simpler and more proper to 
work assuming Tate's conjecture on $L_{C \times C}$ rather than automorphicity. 

Using many small differences. Exponential moments and high moments. Now, how 
may we use these small differences between the probability of d being a square (for d a 
random integer) and the probability of d being a square (for d = f(x)/q 2 , x a random 
prime)? 

Suppose I am throwing a fair coin in the air. A gentle wind blows; it may change 
directions very often, but becomes gradually milder. I know that the wind has a slight 
effect on the way the coin lands: if the wind blows from the east, then, I posit, heads are 
more likely, whereas, if the wind blows from the west, tails are more likely. You, however, 
will not believe me. How shall I make my point? 

Let us assume I can measure the strength and direction of the wind before every coin 
throw. I shall throw the coin in the air many times, betting on heads or tails according to 
what I reckon to be more likely, given the wind. If, at the end, I have collected statistically 
significant winnings, you will have to acknowledge that I am in the right. 
Our situation is analogous. Instead of wind, we have $a_p$; instead of a coin, we have 
whether or not f(x) is a square mod p for a random prime x. (The prime x stays fixed as p 
varies.) If f(x) (and thus d) lands on the more likely side of squareness or non-squareness 
for significantly more than one-half of all primes p, then d will be sufficiently strange. 

We can let $X_p$ be a random variable taking the value $-\frac{a_p}{p}$ when f(x) is a square mod 
p, and the value $\frac{a_p}{p}$ when f(x) is a non-square mod p. Then $X_p$ is $-\frac{a_p}{p}$ with probability 
$\frac{1}{2} - \frac{a_p}{2p}$, and $\frac{a_p}{p}$ with probability $\frac{1}{2} + \frac{a_p}{2p}$. It can be seen easily that the expected value of 
$\sum_{p<z} X_p$ is $\sum_{p<z} a_p^2/p^2 \sim \log\log z$. We may assume pairwise independence and obtain 
that $\mathrm{Var}(\sum_{p<z} X_p) = \sum_{p<z} \left(\frac{a_p^2}{p^2} - \frac{a_p^4}{p^4}\right) \sim \log\log z - O(1) \sim \log\log z$. Thus, Chebyshev 
gives us that $\sum_{p<z} X_p$ is > (1 - o(1)) log log z a proportion 1 of the time. Now let $Y_p$ be 
$-\frac{a_p}{p}$ when a random integer d < x is a square residue modulo p; let $Y_p$ be $\frac{a_p}{p}$ otherwise. 
What is the probability that $\sum_{p<z} Y_p$ be larger than (1 - o(1)) log log z? 

The probability that $Y_p$ take either of its two possible values is 1/2. Suppose that the 
variables $Y_p$ were mutually independent. Then the expected value $\mathbb{E}(e^{\sum_p Y_p})$ of $e^{\sum_p Y_p}$ would 
be the product of the expected values of $e^{Y_p}$. We can use this as follows. First of all, 



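$$\mathrm{Prob}\Big(\sum_{p<z} Y_p > (1-o(1))\log\log z\Big) \le \mathrm{Prob}\Big(e^{\sum_{p<z} Y_p} > (\log z)^{1-o(1)}\Big) \le \frac{\mathbb{E}\big(e^{\sum_{p<z} Y_p}\big)}{(\log z)^{1-o(1)}}.$$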



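Now, as we were saying, 

$$\mathbb{E}\big(e^{\sum_{p<z} Y_p}\big) = \prod_{p<z} \mathbb{E}\big(e^{Y_p}\big) = \prod_{p<z} \frac{e^{a_p/p} + e^{-a_p/p}}{2} \le e^{\sum_{p<z} a_p^2/(2p^2)} = e^{(\frac{1}{2}+o(1))\log\log z} = (\log z)^{\frac{1}{2}+o(1)}.$$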

Hence 

(3)   $\mathrm{Prob}\Big(\sum_{p<z} Y_p > (1-o(1))\log\log z\Big) \le \frac{(\log z)^{1/2+o(1)}}{(\log z)^{1-o(1)}} = \frac{1}{(\log z)^{1/2-o(1)}}$, 

which is the bound we desire. 

Now, the variables $Y_p$ are not in fact mutually independent, and, since the probabilities 
we are dealing with are close to 1/2 rather than to 0, we cannot apply the tricks in 
Erdős-Kac. A simpler approach will in fact do. Let $z = N^{\frac{1}{3\log\log N}}$ and $k = \frac{1}{2}\log\log z$. 
Then, while the variables $Y_p$ are not mutually independent, they are more than pairwise 
independent: any 2k of them are mutually independent (with a small error term). We 
can thus proceed as in the proof of Chebyshev's theorem, taking a (2k)th power instead 
of a square. The bound thus obtained is essentially as good as (3): we obtain that the 
probability that $\sum_{p<z} Y_p$ be larger than (1 - o(1)) log log z is $O(1/(\log z)^{1/2-o(1)}) = 
O(1/(\log N)^{1/2-o(1)})$. 
We conclude that, if d is strange in the two ways we have considered - having numbers 
of prime divisors differing from the norm, and "agreeing with the wind" for considerably 
more than half of all p's - then it lies in a set of cardinality at most 

$$\frac{N}{(\log N)^{\frac{1}{2}\log 3 + \frac{1}{2} - o(1)}}.$$

Thus (2) is indeed at most $o(N/\log N)$, and we are done proving the main theorem. 